Combining human analysis and machine data mining to obtain credible data relations
Abstract
Can a model constructed using data mining (DM) programs be trusted? It is known that a decision-tree model can contain relations that are statistically significant but, in reality, meaningless to a human. When the task is domain analysis, meaningless relations are problematic, since they can lead to wrong conclusions and can consequently undermine a human’s trust in DM programs. To eliminate problematic relations from the conclusions of analysis, we propose an interactive method called Human–Machine Data Mining (HMDM). The method constructs multiple models in a specific way so that a human can reexamine the relations in different contexts and, based on observed evidence, conclude which relations and models are credible, that is, both meaningful and of high quality. Based on the extracted credible relations and models, the human can construct correct overall conclusions about the domain. The method is demonstrated in two complex domains, extracting credible relations and models that indicate the segments of the higher education sector and the research and development sector that influence the economic welfare of a country. An experimental evaluation shows that the method is capable of finding important relations and models that are better in both meaning and quality than those constructed solely by the DM programs. © 2014 Elsevier Inc. All rights reserved.
Similar resources
Detecting Diseases in Medical Prescriptions Using Data Mining Tools and Combining Techniques
Data about the prevalence of communicable and non-communicable diseases, as one of the most important categories of epidemiological data, is used for interpreting health status of communities. This study aims to calculate the prevalence of outpatient diseases through the characterization of outpatient prescriptions. The data used in this study is collected from 1412 prescriptions for various ty...
Forecasting Stock Price Movements Based on Opinion Mining and Sentiment Analysis: An Application of Support Vector Machine and Twitter Data
Today, social networks are fast and dynamic communication intermediaries that are a vital business tool. This study aims at examining the views of those involved with Facebook stocks so that we can summarize their views to predict the general behavior of this stock and collectively consider possible Facebook stock price movements, and create a more accurate pattern compared to previous patterns...
Searching for Credible Relations in Machine Learning
When machine learning (ML) and data mining (DM) methods construct models in complex domains, models can contain less-credible parts [2], which are statistically significant, but meaningless to the human analyst. For example, let us consider a decision tree model presented in Figure 1. The tree is constructed with the J48 algorithm in Weka [8] for a complex domain indicating which segments of re...
The machine learning process in applying spatial relations of residential plans based on samples and adjacency matrix
The current world is moving towards the development of hardware or software presence of artificial intelligence in all fields of human work, and architecture is no exception. Now this research seeks to present a theoretical and practical model of intuitive design intelligence that shows the problem of learning layout and spatial relationships to artificial intelligence algorithms; Therefore, th...
Journal: Inf. Sci.
Volume: 288, Issue:
Pages: -
Publication date: 2014